Abstract. Many clustering methods have been proposed in the data mining field,
yet their standard versions do not take uncertain databases into account.
This paper presents a new approach for clustering uncertain data, based on
a hierarchical clustering defined within the belief function framework.

The main objective of the belief hierarchical clustering is to allow an
object to belong to one or several clusters. A degree of belief is associated
with each membership, and clusters are combined based on the pignistic
probability. Experiments with real uncertain data show that the proposed
method is a promising tool.
1 Introduction

Due to the increase of imperfect data, the process of decision making is becoming
harder. To cope with this, data analysis is being applied in various fields.

Clustering is widely used in data mining and aims at grouping a set of similar
objects into clusters. In this context, many clustering algorithms exist and are
categorized into two main families:

The first family involves the partitioning methods, such as the k-means
algorithm, which is widely used thanks to its convergence speed. It partitions
the data into k clusters represented by their centers. The second family
includes the hierarchical clustering methods, such as the top-down methods and
the Hierarchical Ascendant Clustering (HAC) [5]. The latter builds clusters
recursively by merging the objects in a bottom-up way. This process leads to
good visualizations of the results. Nevertheless, it has a non-linear complexity.

All these standard methods deal with certain and precise data. However, to
support decision making on real data, it is more appropriate to handle
uncertain data. Here, we need a soft clustering process that takes into account
the possibility that objects belong to more than one cluster.

Several methods have been established for this purpose. Among them, the
Fuzzy C-Means (FCM) assigns to each data point a membership corresponding to
each cluster center, with the weights minimizing the total weighted
mean-square error. This method always converges. The Evidential c-Means
(ECM) [3], [7] is regarded as a very fruitful method. It enhances the
FCM and generates a credal partition from attribute data. This method deals
with the clustering of object data. Likewise, the belief k-modes method [1] is
a popular method, which builds K groups characterized by uncertain attribute
values and provides a classification of new instances. Schubert has also proposed a
clustering algorithm [8] which uses the mass on the empty set to build a classifier.

Our objective in this paper is to develop a belief hierarchical clustering 
method, in order to ensure the membership of objects in several clusters, and to 
handle the uncertainty in data under the belief function framework. 

The remainder of this paper is organized as follows: in the next section we
review the ascendant hierarchical clustering, its concepts and its
characteristics. In Section 3, we recall some of the basic concepts of belief
function theory. Our method is described in Section 4 and we evaluate its
performance on real data sets in Section 5. Finally, Section 6 concludes the paper.


2 Ascendant hierarchical clustering 

This method consists in agglomerating the closest clusters until a single
cluster contains all the objects $x_j$ (where $j = 1, \ldots, N$).

Let us consider $\mathcal{P}^K = \{C_1, \ldots, C_K\}$ the set of clusters. If $K = N$, then
$C_1 = x_1, \ldots, C_N = x_N$. Thereafter, throughout all the steps of clustering, we
move from a partition $\mathcal{P}^K$ to a partition $\mathcal{P}^{K-1}$. The result is described by a
hierarchical clustering tree (dendrogram), where the nodes represent the
successive fusions and the height of a node represents the value of the distance
between the two merged objects or clusters. This gives a concrete meaning to the
level of the nodes, referred to as an "indexed hierarchy". The latter is usually
indexed by the values of the distances (or dissimilarities) at each aggregation
step. The indexed hierarchy can be seen as a set with an ultrametric distance $d$
which satisfies these properties:
i) $x = y \iff d(x, y) = 0$.
ii) $d(x, y) = d(y, x)$.
iii) $d(x, y) \leq d(x, z) + d(y, z), \ \forall x, y, z \in \mathbb{R}$.

An ultrametric additionally satisfies the stronger inequality
$d(x, y) \leq \max(d(x, z), d(y, z))$.

The algorithm is as follows:

- Initialisation: the initial clusters are the N singletons. We compute their
dissimilarity matrix.

- Iterate these two steps until the aggregation yields a single cluster:

  • Combine the two most similar (closest) elements (clusters) from the
  selected groups according to some distance rules.

  • Update the distance matrix by replacing the two grouped elements by
  the new one and calculating its distance from each of the other classes.
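
The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this paper: the function names (`euclidean`, `single_link`, `hac`) and the toy points are hypothetical, the single-link (minimum) distance is chosen arbitrarily, and the distances are recomputed at each iteration instead of updating a stored matrix, which is simpler but slower.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two points given as tuples of floats.
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def single_link(c1, c2, points):
    # Minimum distance between any object of c1 and any object of c2.
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def hac(points, linkage=single_link):
    """Return the successive merges as (cluster_a, cluster_b, height)."""
    clusters = [(i,) for i in range(len(points))]  # the N singletons
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under the chosen linkage.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two grouped elements by the new cluster.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = hac(points)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each entry of `merges` corresponds to one node of the dendrogram, and its third component is the height at which the node is drawn.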

Once these steps are completed, we no longer have a partition of $K$ clusters,
but a partition of $K - 1$ clusters. Hence, we have to specify the aggregation
criterion (distance rules) between two points and between two clusters. We can
use the Euclidean distance between the $N$ objects $x$ defined in a real-valued space. Different
distances can be considered between two clusters. We can consider the minimum
as follows:

$$d(C_j^i, C_{j'}^i) = \min_{x_k \in C_j^i,\ x_{k'} \in C_{j'}^i} d(x_k, x_{k'}) \tag{1}$$

with $j, j' = 1, \ldots, i$. The maximum can also be considered; however, the minimum
and maximum distances create compact clusters that are sensitive to "outliers". The
average can also be used, but the most widely used method is Ward's method, which uses
Huygens' formula to compute:


$$\Delta I_{\mathrm{inter}}(C_j^i, C_{j'}^i) = \frac{m_{C_j} m_{C_{j'}}}{m_{C_j} + m_{C_{j'}}}\, d^2(C_j^i, C_{j'}^i) \tag{2}$$


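where $m_{C_j}$ and $m_{C_{j'}}$ are the numbers of elements of $C_j^i$ and $C_{j'}^i$ respectively, and
$C_j^i$, $C_{j'}^i$ in the squared distance denote the cluster centers. Then, we have to find
the couple of clusters $(C_l^i, C_{l'}^i)$ minimizing the distance:

$$d(C_l^i, C_{l'}^i) = \min_{C_j^i,\ C_{j'}^i \in \mathcal{P}^i} d(C_j^i, C_{j'}^i) \tag{3}$$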
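
As an illustration of Ward's criterion and of the search for the couple of clusters to merge, the following Python sketch may help. It assumes that clusters are given as lists of coordinate tuples and that the cluster center is the arithmetic mean of its objects; the names `centroid`, `ward_distance` and `closest_pair` and the toy data are hypothetical.

```python
def centroid(cluster):
    # Arithmetic mean of the objects of a cluster (list of tuples).
    dim = len(cluster[0])
    return tuple(sum(p[k] for p in cluster) / len(cluster) for k in range(dim))

def ward_distance(c1, c2):
    # Size-weighted squared distance between the two cluster centers.
    g1, g2 = centroid(c1), centroid(c2)
    squared = sum((u - v) ** 2 for u, v in zip(g1, g2))
    return len(c1) * len(c2) / (len(c1) + len(c2)) * squared

def closest_pair(clusters, dist=ward_distance):
    # Return (distance, a, b) for the pair of clusters minimizing dist.
    pairs = ((dist(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters)))
    return min(pairs)

clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(4.0, 1.0)],
    [(10.0, 1.0), (10.0, 3.0), (10.0, 5.0)],
]
value, a, b = closest_pair(clusters)
print(a, b, value)
```

In this toy example the first two clusters are selected, since the weighted squared distance between their centers is the smallest of the three candidate pairs.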


3 Basics of the theory of belief functions


In this section, we briefly review the main concepts of the theory of belief
functions [5] that underlie our method, as interpreted in the
Transferable Belief Model (TBM) [10]. Let us suppose that the frame of discernment
is $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$. $\Omega$ is a finite set, and a state of partial
knowledge about it can be represented by a basic belief assignment (bba) defined as:

$$m : 2^{\Omega} \rightarrow [0, 1], \qquad \sum_{A \subseteq \Omega} m(A) = 1 \tag{4}$$

The value $m(A)$ is named the basic belief mass (bbm) of $A$. A subset $A \in 2^{\Omega}$ is
called a focal element if $m(A) > 0$. One of the important rules in belief function theory
is the conjunctive rule, which combines two basic belief assignments
$m_1$ and $m_2$ induced from two distinct and reliable information sources. It is defined
as:

$$(m_1 \cap m_2)(C) = \sum_{A \cap B = C} m_1(A) \cdot m_2(B), \quad \forall C \subseteq \Omega \tag{5}$$

The Dempster rule is the normalized conjunctive rule: 

$$(m_1 \oplus m_2)(C) = \frac{1}{1 - \kappa} \sum_{A \cap B = C} m_1(A) \cdot m_2(B), \quad \forall C \subseteq \Omega,\ C \neq \emptyset, \qquad \kappa = \sum_{A \cap B = \emptyset} m_1(A) \cdot m_2(B) \tag{6}$$

with $(m_1 \oplus m_2)(\emptyset) = 0$.

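In order to make a decision, beliefs are transformed into a probability
measure denoted BetP, the pignistic probability [10], defined as follows:

$$\mathrm{BetP}(A) = \sum_{B \subseteq \Omega,\ B \neq \emptyset} \frac{|A \cap B|}{|B|}\, \frac{m(B)}{1 - m(\emptyset)}, \quad \forall A \in \Omega \tag{7}$$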
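
The notions recalled in this section can be illustrated by a short Python sketch. It is a minimal sketch under our own representation choices, not code from this paper: a mass function is assumed to be a dictionary mapping `frozenset` focal elements to masses, and the names `conjunctive`, `dempster` and `betp` as well as the numerical masses are hypothetical.

```python
from itertools import product

def conjunctive(m1, m2):
    # Unnormalized conjunctive rule: mass goes to the intersection A & B.
    out = {}
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        c = a & b
        out[c] = out.get(c, 0.0) + va * vb
    return out

def dempster(m1, m2):
    # Normalized conjunctive rule: drop the empty set and rescale.
    m = conjunctive(m1, m2)
    conflict = m.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in m.items()}

def betp(m, frame):
    # Pignistic probability of each singleton of the frame.
    empty = m.get(frozenset(), 0.0)
    p = {w: 0.0 for w in frame}
    for b, v in m.items():
        for w in b:  # the empty set contributes nothing
            p[w] += v / (len(b) * (1.0 - empty))
    return p

frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.3, frame: 0.1}
m2 = {frozenset({"b"}): 0.5, frame: 0.5}

m12 = conjunctive(m1, m2)   # keeps a mass on the empty set
combined = dempster(m1, m2)  # same masses, renormalized
bet = betp(combined, frame)
print(m12[frozenset()], bet)
```

Because the pignistic transformation divides by $1 - m(\emptyset)$, applying `betp` to the unnormalized result `m12` gives the same probabilities as applying it to `combined`.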
4 Belief hierarchical clustering

In order to develop a belief hierarchical clustering, we choose
to work on two levels: on the one hand, the object level; on the other hand, the
cluster level. At the beginning, for the $N$ objects we have, the frame of discernment is
$\Omega = \{x_1, \ldots, x_N\}$, and for each object belonging to one cluster, a degree of belief
is assigned. Let $\mathcal{P}^N$ be the partition of the $N$ objects. Hence, we define a mass
function for each object $x_i$, inspired from the $k$-nearest neighbors method [2],
which is defined as follows:

$$m_i^{\Omega_i}(x_j) = \alpha e^{-\gamma d^2(x_i, x_j)}, \qquad m_i^{\Omega_i}(\Omega_i) = 1 - \sum_{j \neq i} m_i^{\Omega_i}(x_j) \tag{8}$$

where $i \neq j$, $\alpha$ and $\gamma$ are two parameters we can optimize, $d$ can be
considered as the Euclidean distance, and the frame of discernment is given by

$\Omega_i = \{x_1, \ldots, x_N\} \setminus \{x_i\}$.

In order to move from the partition of $N$ objects to a partition of $N - 1$
clusters, we have to find the two nearest objects $(x_i, x_j)$ to form a cluster.
The partition of $N - 1$ clusters is then given by $\mathcal{P}^{N-1} = \{(x_i, x_j), x_k\}$,
where $k \in \{1, \ldots, N\} \setminus \{i, j\}$. The nearest objects are found using the pignistic
probability of each object $x_i$, defined on the frame $\Omega_i$. We first compute the
pignistic probability for each object, then compare the objects by pairs by
computing the product of their mutual pignistic values, and finally retain the
pair that maximizes this product:

$$(x_i, x_j) = \arg\max_{x_i, x_j \in \mathcal{P}^N} \mathrm{BetP}^{\Omega_i}(x_j) * \mathrm{BetP}^{\Omega_j}(x_i) \tag{9}$$

Then, this first couple of objects is a cluster. Now consider that we have a
partition $\mathcal{P}^K$ of $K$ clusters $\{C_1, \ldots, C_K\}$. In order to find the best partition
$\mathcal{P}^{K-1}$ of $K - 1$ clusters, we have to find the best couple of clusters to be merged.
First, if we consider one of the classical distances $d$ (single link, complete link,
average, etc.) presented in Section 2 between the clusters, we define a mass
function, within the frame $\Omega_i$, for each cluster $C_i \in \mathcal{P}^K$ with $C_i \neq C_j$
by:


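$$m_i^{\Omega_i}(C_j) = \alpha e^{-\gamma d^2(C_i, C_j)} \tag{10}$$

$$m_i^{\Omega_i}(\Omega_i) = 1 - \sum_{C_j \in \Omega_i} m_i^{\Omega_i}(C_j) \tag{11}$$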

where $\Omega_i = \{C_1, \ldots, C_K\} \setminus \{C_i\}$. Then, the two clusters to merge are given by:

$$(C_i, C_j) = \arg\max_{C_i, C_j \in \mathcal{P}^K} \mathrm{BetP}^{\Omega_i}(C_j) * \mathrm{BetP}^{\Omega_j}(C_i) \tag{12}$$

and the partition $\mathcal{P}^{K-1}$ is made from the new cluster $(C_i, C_j)$ and all the other
clusters of $\mathcal{P}^K$. The point of doing so is to show that by maximizing the
degree of probability we obtain the couple of clusters to combine. Of course,
this approach gives exactly the same partitions as the classical ascendant
hierarchical clustering, but the dendrogram can be built from BetP, and the best
partition (i.e. the number of clusters) can be easier to find. The indexed
hierarchy is indexed by the sum of BetP, which leads to more precise
and specific results according to the dissimilarity between objects and therefore
facilitates our process.
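
The selection of the couple to merge can be sketched as follows in Python. This is a sketch under explicit assumptions rather than the implementation evaluated in this paper: each item (object or cluster) is assumed to give a mass $\alpha e^{-\gamma d^2}$ to every other item and the remaining mass to the whole frame, in the spirit of the evidential $k$-nearest neighbors rule [2]; the names `masses`, `betp`, `best_pair`, the default parameter values and the one-dimensional toy data are hypothetical.

```python
import math

def masses(i, items, dist, alpha, gamma):
    # ASSUMED form: mass alpha * exp(-gamma * d^2) on each other item,
    # the rest of the mass on the whole frame (ignorance).
    m = {j: alpha * math.exp(-gamma * dist(items[i], items[j]) ** 2)
         for j in range(len(items)) if j != i}
    ignorance = 1.0 - sum(m.values())
    if ignorance < 0.0:
        raise ValueError("alpha too large: masses sum to more than 1")
    return m, ignorance

def betp(i, items, dist, alpha, gamma):
    # Pignistic probability: the ignorance is shared equally among
    # the elements of the frame (all items except item i).
    m, ignorance = masses(i, items, dist, alpha, gamma)
    share = ignorance / len(m)
    return {j: v + share for j, v in m.items()}

def best_pair(items, dist, alpha=0.3, gamma=1.0):
    # Pair maximizing the product of the mutual pignistic values.
    # Requires at least two items.
    bet = [betp(i, items, dist, alpha, gamma) for i in range(len(items))]
    scored = ((bet[i][j] * bet[j][i], i, j)
              for i in range(len(items))
              for j in range(i + 1, len(items)))
    score, i, j = max(scored)
    return i, j, score

items = [0.0, 0.4, 3.0, 3.1]           # toy one-dimensional objects
dist = lambda a, b: abs(a - b)
i, j, score = best_pair(items, dist)
print(i, j, round(score, 4))
```

With these toy values the two objects at 3.0 and 3.1 are merged first, which is also the pair a classical distance-based HAC would choose. Note that `alpha` must be small enough for the masses to sum to at most one, hence the explicit check in `masses`.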

Hereafter, we define another way to build the partition $\mathcal{P}^{K-1}$. For each initial
object $x_i$ to classify, there exists a cluster of $\mathcal{P}^K$ such that $x_i \in C_k$. We consider the
frame of discernment $\Omega_k = \{C_1, \ldots, C_K\} \setminus \{C_k\}$, on which a mass function,
noted $m_i^{\Omega_k}$, describes the degree to which two clusters could be merged. We define
this mass function by:




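$$m_i^{\Omega_k}(C_{k_j}) = \prod_{x_j \in C_{k_j}} \alpha e^{-\gamma d^2(x_i, x_j)} \tag{13}$$

$$m_i^{\Omega_k}(\Omega_k) = 1 - \sum_{C_{k_j} \in \Omega_k} m_i^{\Omega_k}(C_{k_j}) \tag{14}$$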


In order to find a mass function for each cluster $C_i$ of $\mathcal{P}^K$, we combine all
the mass functions given by all objects of $C_i$ by a combination rule such as
the Dempster rule of combination given by equation (6). Then, to merge the two
clusters we use equation (12) as before. The sum of the pignistic probabilities
will be the index of the dendrogram, called the BetP index.


5 Experiments

Experiments were first carried out on the diamond data set, composed of twelve objects
as described in Figure 1.a and analyzed in [7]. The dendrograms for both the classical
and the Belief Hierarchical Clustering (BHC) are shown in Figures 1.b and 1.c.
Object 12 is correctly considered as an outlier by both approaches. With the
belief hierarchical clustering, this object is clearly different, thanks to the
pignistic probability. For HAC, the distance between object 12 and the other objects
is small, whereas for BHC there is a big gap between object 12 and the others.
This suggests that our method is better at detecting outliers. While objects
5 and 6 are associated with objects 1, 2, 3 and 4 by the classical hierarchical
clustering, BHC identifies these points more clearly as different. This synthetic
data set is special because the points are equidistant and there is no uncertainty.
[Figure 1: a. Diamond data set; b. Hierarchical clustering (cluster dendrogram); c. Belief hierarchical clustering (cluster dendrogram)]

Fig. 1. Clustering results for Diamond data set.


We continue our experiments with a well-known data set, the Iris data set, which
is composed of flowers from three species of Iris described by sepal length,
sepal width, petal length, and petal width. The data set contains three clusters
known to have a significant overlap.

In order to reduce the complexity and present the dendrogram clearly, we
first used the k-means method to get a few initial clusters for our algorithm.
This step is not necessary if the number of objects is not high.

Several experiments were conducted with different numbers of clusters. We
present in Figure 2 the dendrograms obtained for 10 and 13 clusters. We notice
different combinations between the nearest clusters for the classical and the belief
hierarchical clustering. The best situation for BHC is obtained with the pignistic
probability equal to 0.5, because it indicates that the data set is composed of three
significant clusters, which reflects the real situation. For the classical hierarchical
clustering the results are not so obvious. Indeed, for HAC, it is difficult to decide on the
optimum cluster number because of the use of the Euclidean distance and, as seen
in Figure 2.c, it is indistinguishable in terms of the y-value. However, for BHC, this
is easier thanks to the use of the pignistic probability.

[Figure 2: cluster dendrograms. a. $K_{init} = 10$ for HAC; b. $K_{init} = 13$ for HAC; c. $K_{init} = 10$ for BHC; d. $K_{init} = 13$ for BHC]

Fig. 2. Clustering results on IRIS data set for both hierarchical (HAC) (Fig. a and b)
and belief hierarchical (BHC) (Fig. c and d) clustering ($K_{init}$ is the number of
clusters first obtained by k-means).

In order to evaluate the performance of our method, we use some of the most
popular measures: precision, recall and Rand Index (RI). The results for both
BHC and HAC are summarized in Table 1. The first three columns are for BHC,
while the others are for HAC. We denote by $F_c$ the final
number of clusters, and we vary it from $F_c = 2$ to $F_c = 6$. We fixed the value
of $K_{init}$ at 13. We note that for $F_c = 2$ the precision is low while the recall is
high, and that for a high cluster number ($F_c = 5$ or 6) the
precision is high but the recall is relatively low. We also note that for
the same number of final clusters (e.g. $F_c = 4$), our method is better in terms of
precision, recall and RI.
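
For reference, these three measures can be computed by counting pairs of objects, as in the Python sketch below. The pair-counting definitions are an assumption on our side, since the exact definitions are not spelled out in the text, and the function name `pair_scores` and the labels are hypothetical. As a plausibility check, a hypothetical partition of three classes of 50 objects into two clusters of 50 and 100 objects yields, under these definitions, the values reported for $F_c = 2$ in Table 1.

```python
from itertools import combinations

def pair_scores(truth, pred):
    """Pair-counting precision, recall and Rand index.

    A pair of objects is a true positive when it shares both the same
    cluster (pred) and the same class (truth).
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class = truth[i] == truth[j]
        same_cluster = pred[i] == pred[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    rand_index = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, rand_index

# Hypothetical labels: three classes of 50 objects, two clusters (50 and 100).
truth = [0] * 50 + [1] * 50 + [2] * 50
pred = [0] * 50 + [1] * 100
precision, recall, rand_index = pair_scores(truth, pred)
print(round(precision, 4), round(recall, 4), round(rand_index, 4))
```

Merging two classes into one cluster keeps every same-class pair together, which explains a recall of 1 together with a low precision.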































































Table 1. Evaluation results

                      BHC                            HAC
           Precision  Recall  RI          Precision  Recall  RI
F_c = 2    0.5951     1.0000  0.7763      0.5951     1.0000  0.7763
F_c = 3    0.8011     0.8438  0.8797      0.6079     0.9282  0.7795
F_c = 4    0.9506     0.8275  0.9291      0.8183     0.7230  0.8561
F_c = 5    0.8523     0.6063  0.8360      0.8523     0.6063  0.8360
F_c = 6    0.9433     0.5524  0.8419      0.8916     0.5818  0.8392


Tests were also performed on a third data set, the Congressional Voting Records
Data Set. The results presented in Figure 3 show that the pignistic probability
value increases at each level, thereby giving a more homogeneous partition.
We notice different combinations between the nearest clusters, which are not the
same for the two methods compared. For example, cluster 9 is associated with
cluster 10 and then with cluster 6 by HAC, whereas with BHC it is associated with cluster 4
and then with cluster 10. Moreover, from the BHC dendrograms shown in Figure 3.c
and Figure 3.d, the best situation indicating the optimum number of clusters can
be clearly obtained. This ease is due to the use of the pignistic probability.


[Figure 3: cluster dendrograms. a. $K_{init} = 10$ for HAC; b. $K_{init} = 13$ for HAC; c. $K_{init} = 10$ for BHC; d. $K_{init} = 13$ for BHC]

Fig. 3. Clustering results on Congressional Voting Records Data Set for both
hierarchical and belief hierarchical clustering.
For this data set, we notice that for $F_c = 2$ and 3, the precision is low while the
recall is high. However, as the number of clusters increases,
BHC provides better results. In fact, for $F_c = 3$, 4, 5 and 6, the precision and
RI values of BHC are higher than those of HAC, which confirms that our
approach is better in terms of precision and RI.


Table 2. Evaluation results for Congressional Voting Records Data Set

                      BHC                            HAC
           Precision  Recall  RI          Precision  Recall  RI
F_c = 2    0.3873     0.8177  0.5146      0.5951     1.0000  0.7763
F_c = 3    0.7313     0.8190  0.8415      0.6288     0.8759  0.7892
F_c = 4    0.8701     0.6833  0.8623      0.7887     0.7091  0.8419
F_c = 5    0.8670     0.6103  0.8411      0.7551     0.6729  0.8207
F_c = 6    0.9731     0.6005  0.8632      0.8526     0.6014  0.8347


6 Conclusion 

We have introduced a new clustering method using the hierarchical
paradigm in order to handle uncertainty within the belief function framework.
This method puts the emphasis on the fact that one object may belong to several
clusters. It merges clusters based on their pignistic probability. Our method
was evaluated on several data sets and the corresponding results show its
efficiency. The algorithm complexity, however, remains the usual problem
of belief function theory. Our future work will focus on this
particular problem.